Method and a system for embedding textual forensic information

ABSTRACT

A method for automatically embedding information in a digital text, said method comprising: identifying a plurality of positions, in said digital text, that are suitable for introducing modifications into said digital text; identifying modifications suitable for introduction into at least some of said suitable positions in said digital text; selecting at least some of said identified modifications for introduction into said digital text, said selection of said modifications being operable to represent said information; and performing said selected modifications on said digital text, thereby to embed said information.

FIELD OF THE INVENTION

[0001] The present invention relates generally to the field of securing digital content. More specifically, the present invention deals with forensic methods for breach analysis and business espionage mitigation.

BACKGROUND OF THE INVENTION

[0002] Modern businesses and industries rely heavily on digital content as a primary means of communication and documentation. Digital content can be easily copied and distributed (e.g., via e-mail, instant messaging, peer-to-peer networks, FTP and web-sites), which greatly increases hazards such as business espionage and data leakage. There is therefore great interest in methods that would mitigate the risks of digital espionage and unauthorized dissemination of proprietary information.

[0003] In general, counter-espionage methods for digital content can be divided into two categories: proactive methods, which increase the difficulty of unauthorized copying and distribution of digital documents, and reactive methods, which provide means for detecting and tracking breached content, both for forensic purposes and for tracking and incriminating suspects, thereby providing an effective deterrent.

[0004] Current attempts to automatically mitigate espionage are focused on proactive methods. While these methods can be helpful in some cases, it is generally believed that any proactive method may eventually be circumvented, and there is a strong need to complement these methods with reactive means that provide forensic evidence and a means for incriminating suspects. An effective forensic measure should provide an effective means of determining the exact source of a breached document.

[0005] In the context of secure distribution of multimedia content, some forensic methods require that a unique, personalized digital watermark, dubbed a “fingerprint”, be embedded into each copy of the data before it is sent to the final user, thereby binding each copy to an authorized and accountable user. Numerous methods exist for personalized watermarking of multimedia files, such as video and audio content: in these cases, the high level of redundancy allows watermarks to be embedded into the media in a manner that does not reduce the quality of the media and yet is robust to both malicious and non-malicious attacks. Some methods for embedding steganograms (hidden messages) inside a text also exist, and can be traced back to far antiquity. However, since the amount of redundancy in text is much smaller than the redundancy in audio or video, it is harder to embed such hidden messages in a text in a robust manner, in particular if the embedding process is to be performed automatically. Current methods for automatic embedding of steganograms in text are usually based on altering the number of spaces at the end of a line, which makes them highly vulnerable to format changes.

[0006] In many cases, documents are prepared by groups, where each member of the group introduces his own modifications into a document. An efficient document forensic system should take this into account, and embed modifications that are as robust as possible against casual editing while allowing seamless group work on copies that contain somewhat different versions of the documents.

[0007] Embedding steganograms into text is also important for copyright protection of digital books: illegal copying and distribution of digital books, also known as “e-books”, has been prevalent in recent years, especially over the Internet. Such illegal copying and distribution infringes copyright law and causes financial damage to the rightful owners of the content. It is therefore of great interest to find methods that would stop, or at least reduce, illegal copying and/or distribution of digital texts without interfering with rightful usage. To date, no such method is in use.

[0008] Another important aspect of a forensic technique is robustness: a forensic method should be robust against incidental changes to the content and, preferably, against deliberate attempts to remove the forensic marks. Current methods usually lack an adequate level of robustness.

[0009] Prior art regarding the use of forensic data for tracking breaches and detecting espionage includes the manual insertion of small modifications into various copies of the document, the insertion of identification data into the metadata of the binary file, and altering the number of spaces at the end of the lines of the text. Such methods do not provide an adequate solution to the problem of modern businesses: the rate at which copies of digital documents are produced renders the cost of manual insertion of modifications prohibitive, and the plurality of formats in which the information can be represented renders metadata-based methods ineffective, since file metadata is often altered when the format of the file is changed.

[0010] There is thus a recognized need for, and it would be highly advantageous to have, a method and system that allow personalized watermarking of text in digital documents and that overcome the drawbacks of the current methods described above.

SUMMARY OF THE INVENTION

[0011] According to a first aspect of the present invention there is provided a method for automatically embedding information in a digital text, the method comprising:

[0012] identifying a plurality of positions, in the digital text, that are suitable for introducing modifications into the digital text;

[0013] identifying modifications suitable for introduction into at least some of the suitable positions in the digital text;

[0014] selecting at least some of the identified modifications for introduction into the digital text, the selection of the modifications being operable to represent the information; and

[0015] performing the selected modifications on the digital text, thereby to embed the information.
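By way of illustration only, the four steps above can be sketched in Python for a single family of modifications, namely exchanging between valid spellings of a word. The variant table `VARIANTS` and the functions `identify_positions`, `embed` and `extract` are hypothetical names introduced for this sketch: each occurrence of a listed word is treated as a position, the first spelling of a pair stands for bit 0 and the second for bit 1, and matching is assumed to be case-sensitive.

```python
import re

# Hypothetical table of interchangeable spellings: index 0 encodes bit 0, index 1 encodes bit 1.
VARIANTS = [("color", "colour"), ("gray", "grey"), ("toward", "towards"), ("analyze", "analyse")]


def identify_positions(text):
    """Steps 1-2: locate every occurrence of either spelling and record its alternatives."""
    positions = []
    for pair in VARIANTS:
        pattern = r"\b(?:" + "|".join(map(re.escape, pair)) + r")\b"
        for match in re.finditer(pattern, text):
            positions.append((match.start(), match.end(), pair))
    return sorted(positions)


def embed(text, bits):
    """Steps 3-4: select one spelling per position according to the bits and rewrite the text."""
    positions = identify_positions(text)
    if len(bits) > len(positions):
        raise ValueError("the text offers too few positions for this payload")
    pieces, last = [], 0
    for (start, end, pair), bit in zip(positions, bits):
        pieces.append(text[last:start])
        pieces.append(pair[bit])
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


def extract(text, n_bits):
    """Read the bits back by checking which spelling occupies each position."""
    return [pair.index(text[start:end]) for start, end, pair in identify_positions(text)[:n_bits]]
```

For example, `embed("The grey car turned toward the color wheel; analyze it.", [1, 0, 1, 1])` yields `"The grey car turned toward the colour wheel; analyse it."`. Positions beyond the length of the payload are left as they are.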

[0016] In a preferred embodiment of the present invention, the method further comprises approving the selection of modifications in the digital text.

[0017] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0018] replacing a character with a substantially similar looking character;

[0019] replacing a character with a similar-looking character, where the characters differ only in their digital representation;

[0020] replacing a character with a similar-looking character, where the characters differ only in their Unicode representation;

[0021] removing an unprintable character;

[0022] adding an unprintable character;

[0023] replacing an unprintable character;

[0024] exchanging between at least two possible representations of an end of a paragraph; and exchanging between at least two possible representations of an end of a line.
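A minimal sketch of the character-replacement modifications follows, assuming a hypothetical mapping from a few Latin letters to Cyrillic letters that look the same in most fonts but have different Unicode code points (for example U+0061 LATIN SMALL LETTER A and U+0430 CYRILLIC SMALL LETTER A). Each occurrence of a mapped Latin letter is treated as a position. The sketch assumes that the original text contains no Cyrillic letters; such marks would also be lost if the text were retyped, processed by OCR, or passed through a filter that maps confusable characters to Latin.

```python
# Hypothetical homoglyph table: Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}
REVERSE = {cyr: lat for lat, cyr in HOMOGLYPHS.items()}


def embed_homoglyphs(text, bits):
    """Bit 1 replaces the next mappable Latin letter with its Cyrillic twin; bit 0 keeps it."""
    out, i = [], 0
    for ch in text:
        if i < len(bits) and ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch] if bits[i] else ch)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("the text has too few substitutable characters")
    return "".join(out)


def extract_homoglyphs(text, n_bits):
    """A Latin letter reads as 0, its Cyrillic twin as 1; all other characters are skipped."""
    bits = []
    for ch in text:
        if len(bits) == n_bits:
            break
        if ch in HOMOGLYPHS:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
    return bits
```

The marked text has the same length and, in most fonts, the same appearance as the original, which is what makes this family of modifications attractive for imperceptible marking.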

[0025] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0026] modifying the number of spaces between words;

[0027] modifying the number of spaces between paragraphs;

[0028] modifying the number of spaces between lines;

[0029] modifying the number of spaces at a line ending;

[0030] modifying the number of tabs at a line ending;

[0031] adding at least one space character at a line ending;

[0032] adding at least one tab character at a line ending;

[0033] modifying the size of spaces between words;

[0034] modifying the size of spaces between paragraphs;

[0035] modifying the size of spaces between lines;

[0036] modifying the size of spaces between characters;

[0037] modifying the number of spaces representing a tab character;

[0038] modifying the place of a tab;

[0039] replacing a tab character with at least one space;

[0040] replacing at least one space with a tab character; and modifying the size of a tab character.
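As an illustration of the spacing modifications, the following sketch encodes one bit per inter-word gap: a single space stands for 0 and a double space for 1. The function names are hypothetical, and the sketch assumes that words are separated only by ordinary spaces. Like the end-of-line methods discussed in the background section, such marks do not survive whitespace normalization (HTML rendering, for example, collapses runs of spaces), so they would presumably be combined with more robust modifications.

```python
import re


def embed_spaces(text, bits):
    """Rewrite every inter-word gap: two spaces for bit 1, one space for bit 0 or no bit."""
    words = re.split(r" +", text)
    if len(bits) > len(words) - 1:
        raise ValueError("the text has too few inter-word gaps")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        out.append("  " if i < len(bits) and bits[i] else " ")
        out.append(word)
    return "".join(out)


def extract_spaces(text, n_bits):
    """A gap of two or more spaces reads as 1, a single space as 0."""
    return [1 if len(gap) >= 2 else 0 for gap in re.findall(r" +", text)[:n_bits]]
```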

[0041] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0042] modifying the font of at least one character;

[0043] modifying the color of at least one character;

[0044] modifying the size of at least one character;

[0045] modifying a property of at least one character;

[0046] modifying the background of the digital text;

[0047] modifying the background of at least one character;

[0048] replacing a character with an image similar to the character;

[0049] modifying the digital representation of the digital content;

[0050] modifying the internal logical division in the digital representation of the digital content;

[0051] modifying the classification of a unit in the internal logical division in the digital representation of the digital content;

[0052] modifying a property of a unit in the internal logical division in the digital representation of the digital content;

[0053] modifying the classification of a paragraph; and modifying a property of a paragraph.

[0054] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0055] punctuation modifications;

[0056] spelling modifications;

[0057] spelling modifications that exchange between different valid spellings of the same word; and spelling modifications that exchange between at least one valid spelling of a word and at least one invalid spelling of the word.

[0058] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0059] exchanging between some of the following versions of a word built from at least two words: a concatenated version, a version that uses a hyphen for separation and a version separated by a space;

[0060] spelling modifications that exchange between an acronym and the full, spelled-out version of the acronym;

[0061] spelling modifications that exchange between at least one shortened version of a word and the full version of the word; exchanging between a correct version of a word and at least one other word, the other word having a similar pronunciation to the correct word;

[0062] exchanges between synonyms;

[0063] modifications that affect an order of elements within the digital text;

[0064] modifications that affect an order of words;

[0065] modifications that affect an order of sentences; and modifications that affect an order of paragraphs.

[0066] In a preferred embodiment of the present invention, the modifications include at least one of the following:

[0067] modifications that affect capitalization;

[0068] removing at least one word;

[0069] adding at least one word;

[0070] replacing at least one word;

[0071] modifications to diagrams embedded in the digital text;

[0072] addition of diagrams embedded in the digital text;

[0073] removal of diagrams embedded in the digital text;

[0074] modifications to the shadow of at least one character;

[0075] exchanging between at least two different grammatical structures; and modifying the phrasing of at least a part of the digital text such that the changed version remains similar to the original version.

[0076] In a preferred embodiment of the present invention, the identification of modifications is performed in a manner that takes into consideration limitations imposed by the digital representation of the digital text.

[0077] In a preferred embodiment of the present invention, the embedded information contains information suitable to identify at least one entry in a database, the database entry containing additional information.

[0078] In a preferred embodiment of the present invention, the embedded information contains information operable to identify at least one recipient of the digital text.

[0079] In a preferred embodiment of the present invention, the method further comprises the step of selecting different combinations of the modifications to form different copies of the digital text, such that each of a plurality of recipients of the digital text receives a personally modified version of the digital text, the different combinations within the embedded information being operable to uniquely identify the respective recipient of each copy.

[0080] In a preferred embodiment of the present invention, the embedded information contains information operable to identify at least one editor of the digital text.

[0081] In a preferred embodiment of the present invention, the method further comprises automatically performing the step of identifying positions in the digital text.

[0082] In a preferred embodiment of the present invention, the step of identifying positions in the digital text is performed manually.

[0083] In a preferred embodiment of the present invention, the step of identifying positions in the digital text is performed such that the positions are distributed in a predefined manner within the digital text.

[0084] In a preferred embodiment of the present invention, the predefined manner of distribution of the positions within the digital text is a distribution where every portion of the digital text larger than a given size contains enough embedded information to reconstruct a predetermined subset of the embedded information.
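One simple way to obtain such a distribution, offered here as an assumption rather than as the method of the invention, is to assign payload bits to positions in round-robin order, so that any run of at least as many consecutive positions as there are payload bits carries every bit. The hypothetical helper `windows_cover_payload` checks this property; a layout that places all copies of a bit next to one another fails it. Recovering the alignment of a fragment (that is, which position carries bit 0) is not addressed by this sketch.

```python
def assign_round_robin(num_positions, payload_len):
    """Position i carries payload bit i mod payload_len."""
    return [i % payload_len for i in range(num_positions)]


def assign_blocked(num_positions, payload_len):
    """Counter-example: each payload bit occupies one contiguous block of positions."""
    block = max(1, num_positions // payload_len)
    return [min(i // block, payload_len - 1) for i in range(num_positions)]


def windows_cover_payload(assignment, payload_len, window):
    """True if every run of `window` consecutive positions carries every payload bit."""
    return all(
        len(set(assignment[start:start + window])) == payload_len
        for start in range(len(assignment) - window + 1)
    )
```

With 40 positions and an 8-bit payload, every window of 8 consecutive positions covers the whole payload under the round-robin layout, whereas under the blocked layout a window of 8 positions sees at most two of the eight bits.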

[0085] In a preferred embodiment of the present invention, the predefined manner of distribution of the positions within the digital text is a distribution defined such that removal of a significant number of the positions from the digital text results in significant degradation of the value of the digital text.

[0086] In a preferred embodiment of the present invention, at least part of the embedded information is encoded using at least one of the following:

[0087] error detection code;

[0088] error correction code;

[0089] cryptographic signature; and

[0090] cryptographic encryption.
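As an example of an error correction code that could protect the embedded bits, the following sketch uses the classic Hamming(7,4) code, which adds three parity bits to every four data bits and corrects any single flipped bit per seven-bit block, for instance a position altered by casual editing. The choice of Hamming(7,4) and the helper names are assumptions made for illustration; the invention does not prescribe a particular code.

```python
def hamming74_encode(d):
    """Encode four data bits as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the four data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_at = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if error_at:
        c[error_at - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def ecc_encode(bits):
    """Pad the payload to a multiple of four bits and encode it block by block."""
    bits = list(bits) + [0] * (-len(bits) % 4)
    return [b for i in range(0, len(bits), 4) for b in hamming74_encode(bits[i:i + 4])]


def ecc_decode(code):
    """Decode consecutive seven-bit blocks (any padding bits are returned as well)."""
    return [b for i in range(0, len(code), 7) for b in hamming74_decode(code[i:i + 7])]
```

An error detection code (for example a CRC) could be appended before encoding so that a reader can tell a corrected payload from one damaged beyond the code's capacity.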

[0091] In a preferred embodiment of the present invention, the identification of suitable modifications is performed in a manner that takes into account the limitations imposed by requirements concerning the quality of the digital text and the resemblance of the modified text to the original version of the digital text.

[0092] In a preferred embodiment of the present invention, the selection of the identified modifications is performed such that at least two potential modifications are grouped together; where several versions of the digital text are produced with different embedded information, the modifications in the group are performed in unison, such that if one modification in the group is performed on a version of the text, all other modifications in the group are also performed on that version.

[0093] In a preferred embodiment of the present invention, the modifications in the group are in proximity to each other within the digital text.
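A minimal sketch of grouped modifications follows, assuming that each group of neighboring positions carries a single bit, so that every position in the group receives the same variant. Reading such a group back becomes a majority vote, which, as a side effect, tolerates the loss or alteration of a minority of the group's positions. The helper names and the group size are hypothetical.

```python
def group_positions(num_positions, group_size):
    """Partition position indices into groups of neighboring positions."""
    return [list(range(start, min(start + group_size, num_positions)))
            for start in range(0, num_positions, group_size)]


def choices_for_bits(groups, bits, num_positions):
    """Every position in a group receives the same variant, given by the group's bit."""
    choice = [0] * num_positions
    for group, bit in zip(groups, bits):
        for pos in group:
            choice[pos] = bit
    return choice


def read_group_bits(groups, observed):
    """Majority vote per group; `observed` maps surviving positions to the variant found there."""
    bits = []
    for group in groups:
        votes = [observed[pos] for pos in group if pos in observed]
        bits.append(1 if 2 * sum(votes) > len(votes) else 0)  # ties and empty groups read as 0
    return bits
```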

[0094] In a preferred embodiment of the present invention, the selection of modifications is performed such as to take into account the amount of information that is to be embedded in the digital text.

[0095] In a preferred embodiment of the present invention, the amount of information that is to be embedded in the digital text is dictated by at least one of the following considerations:

[0096] the amount of actual information that needs to be represented by the information embedded in the digital text;

[0097] the usage of error correction code;

[0098] the usage of error detection code;

[0099] the requirements on robustness;

[0100] the required number of different versions of the digital text;

[0101] the need to embed a database index; and

[0102] the need to embed versioning information.
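The considerations above can be combined into a rough estimate of the number of positions required. The following sketch assumes, purely for illustration, that versions are told apart by a binary identifier, that a database index or versioning information adds a fixed number of extra bits, that the optional error correction is the Hamming(7,4) code (seven coded bits per four payload bits), and that robustness is obtained by repeating the coded payload. The function and its parameters are hypothetical.

```python
import math


def required_positions(num_versions, extra_bits=0, use_hamming=True, repetitions=1):
    """Rough number of embedding positions needed for one copy of the digital text."""
    id_bits = max(1, (num_versions - 1).bit_length())  # enough bits to number every version
    payload = id_bits + extra_bits                      # e.g. a database index or version tag
    coded = math.ceil(payload / 4) * 7 if use_hamming else payload
    return coded * repetitions                          # repetition for robustness
```

For 1,000 recipients the identifier needs 10 bits (2^10 = 1024); with Hamming(7,4) this becomes 21 coded bits, and adding 6 bits of versioning information with threefold repetition raises the figure to 84 positions.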

[0103] In a preferred embodiment of the present invention, the embedded information contains at least one of the following:

[0104] versioning information;

[0105] editing history information;

[0106] forensics information;

[0107] transfer history information; and

[0108] information operable to identify and categorize the digital text.

[0109] In a preferred embodiment of the present invention, the embedded information is substantially imperceptible.

[0110] According to a second aspect of the present invention there is provided a method for monitoring digital text by utilizing information embedded in digital texts, the method comprising:

[0111] embedding information in digital texts it is desired to monitor;

[0112] detecting an attempt to use a specific digital text;

[0113] determining whether the specific digital text contains the embedded information;

[0114] determining, according to the embedded information, whether the specific digital text is one of the digital texts it is desired to monitor; and

[0115] reading the information embedded in the specific digital text.
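The determination steps of this method can be sketched as follows, assuming that an extractor (not shown) has already recovered a sequence of candidate bits from the specific digital text, and that the embedding side wrote a 16-bit document identifier followed by an 8-bit check value taken from the low byte of a CRC-32 (Python's `zlib.crc32`). A valid check value indicates that the text carries embedded information; the identifier then decides whether the text is one of those being monitored. The field widths and helper names are hypothetical. A random, unmarked bit sequence would pass an 8-bit check with a probability of about 1/256, so a longer check would presumably be used in practice.

```python
import zlib

ID_BITS, CHECK_BITS = 16, 8  # hypothetical field widths


def to_bits(value, width):
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def check_value(doc_id):
    """Low byte of the CRC-32 of the 16-bit identifier."""
    return zlib.crc32(doc_id.to_bytes(2, "big")) & 0xFF


def make_payload(doc_id):
    """What the embedding side would write into the text."""
    return to_bits(doc_id, ID_BITS) + to_bits(check_value(doc_id), CHECK_BITS)


def inspect_text(extracted_bits, monitored_ids):
    """Return (contains_mark, doc_id, is_monitored) for bits read from a specific text."""
    if len(extracted_bits) < ID_BITS + CHECK_BITS:
        return False, None, False
    doc_id = from_bits(extracted_bits[:ID_BITS])
    if from_bits(extracted_bits[ID_BITS:ID_BITS + CHECK_BITS]) != check_value(doc_id):
        return False, None, False  # unmarked text, or a mark damaged beyond recognition
    return True, doc_id, doc_id in monitored_ids
```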

[0116] In a preferred embodiment of the present invention, the embedded information is operable to identify the source of the digital text when the digital text is found in at least one of the following states:

[0117] in the possession of an unauthorized party;

[0118] in an unauthorized location;

[0119] in an unsecured location; and in an unsecured format.

[0120] In a preferred embodiment of the present invention, the embedded information is further operable to identify at least part of the path by which the digital text reached the state.

[0121] In a preferred embodiment of the present invention, the method further comprises controlling the usage of the digital text according to the embedded information.

[0122] In a preferred embodiment of the present invention, the embedded information contains at least one limitation on the usage of the digital text.

[0123] In a preferred embodiment of the present invention, the limitations comprise at least one of the following:

[0124] limitations on the time at which it is allowable to use the digital text;

[0125] limitations on where it is allowable to use the digital text;

[0126] limitations on how it is allowable to use the digital text; and

[0127] limitations on who is allowed to use the digital text.

[0128] In a preferred embodiment of the present invention, the controlling is dependent on at least one of the following:

[0129] the identity of the user performing the usage;

[0130] the usage rights of the user performing the usage;

[0131] the identity of the digital text;

[0132] the risks associated with the usage;

[0133] the security mechanisms used in the usage; and

[0134] the type of usage.

[0135] In a preferred embodiment of the present invention, the limitations on how the text is used comprise limitations on at least one of the following:

[0136] viewing the digital text;

[0137] editing the digital text;

[0138] transferring the digital text; and

[0139] storing the digital text.

[0140] There is also provided, in accordance with a preferred embodiment of the present invention, a system for controlling usage of a digital text by utilizing information embedded in digital texts, the system comprising:

[0141] at least one computerized information embedding unit operable to embed the information in the digital texts;

[0142] at least one computerized information reading unit operable to read the information embedded in the digital texts;

[0143] at least one computerized digital text usage unit operable to use the digital texts; and

[0144] at least one computerized control unit operable to:

[0145] receive notification from the computerized digital text usage unit, the notification indicating the digital text;

[0146] receive information from the computerized information reading unit, the information being dependent on the information embedded in the digital text and read by the computerized information reading unit; and

[0147] instruct the computerized digital text usage unit on a usage policy for the digital text, the usage policy being dependent on the information received from the computerized information reading unit.
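The interaction between these units can be sketched as follows. The reading unit is modeled as a function that returns a document identifier (or `None` for an unmarked text), and the usage rights are a hypothetical table keyed by user and document. The five policies mirror the usage policies listed later in this section; the decision rule itself (allow permitted usages, report unpermitted viewing, prevent any other unpermitted usage, and leave unmarked texts alone) is an arbitrary example, not a rule taken from the invention.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    PREVENT = "prevent"
    RESTRICT = "restrict"
    MONITOR = "monitor"
    REPORT = "report"
    ALLOW = "allow"


@dataclass
class Notification:
    """Sent by the digital text usage unit when a text is about to be used."""
    user: str
    usage: str  # e.g. "view", "edit", "transfer", "store"
    text: str


class ControlUnit:
    def __init__(self, read_embedded_id, rights):
        self.read_embedded_id = read_embedded_id  # reading unit: text -> doc id or None
        self.rights = rights                      # {(user, doc_id): set of permitted usages}

    def decide(self, note):
        """Return the usage policy the usage unit is instructed to apply."""
        doc_id = self.read_embedded_id(note.text)
        if doc_id is None:
            return Policy.ALLOW  # unmarked text: outside the scope of this example
        if note.usage in self.rights.get((note.user, doc_id), set()):
            return Policy.ALLOW
        return Policy.REPORT if note.usage == "view" else Policy.PREVENT
```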

[0148] In a preferred embodiment of the present invention, the embedded information is operable to identify the source of the digital text when the digital text is found in the possession of an unauthorized party.

[0149] In a preferred embodiment of the present invention, the system further comprises at least one database containing at least one entry containing additional information, and the embedded information is operable to be correlated to the entry.

[0150] In a preferred embodiment of the present invention, the system further comprises at least one computerized document management unit operable to maintain information about digital texts.

[0151] In a preferred embodiment of the present invention, the computerized document management unit is operable to maintain at least one of the following types of information:

[0152] versioning information;

[0153] editing history information;

[0154] usage policy information;

[0155] transfer history information; and

[0156] category information.

[0157] In a preferred embodiment of the present invention, the computerized document management unit is operable to interact with the computerized control unit.

[0158] In a preferred embodiment of the present invention, the interaction comprises at least one of the following:

[0159] the computerized control unit informing the computerized document management unit about usage of the digital text; and

[0160] the computerized document management unit sending information to the computerized control unit, the information sent being operable to be used by the computerized control unit to create the usage policy.

[0161] In a preferred embodiment of the present invention, the usage policy comprises at least one of the following:

[0162] preventing the usage;

[0163] restricting the usage;

[0164] monitoring the usage;

[0165] reporting the usage; and

[0166] allowing the usage.

[0167] In a preferred embodiment of the present invention, the usage policy depends on at least one of the following:

[0168] the identity of the user performing the usage;

[0169] the usage rights of the user performing the usage;

[0170] the identity of the digital text;

[0171] the identity of the editors of the version of the digital text used in the usage;

[0172] the risks associated with the usage;

[0173] the security mechanisms used in the usage; and

[0174] the type of usage.

[0175] In a preferred embodiment of the present invention, the usage comprises at least one of the following:

[0176] viewing the digital text;

[0177] editing the digital text;

[0178] transferring the digital text; and

[0179] storing the digital text.

[0180] In a preferred embodiment of the present invention, the embedded information contains first indication information, the first indication information indicating at least one element in a group, and the embedded information further contains second indication information, the second indication information indicating the group.

[0181] In a preferred embodiment of the present invention, the embedded information contains a plurality of information elements, and a subset of the information elements is embedded into the digital text such that the subset is encoded in a manner more resilient to a change in the digital text than the embedding of another subset of the information elements.
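The two preceding embodiments can be illustrated together with the following sketch, under stated assumptions: the second indication information (a group identifier, for example a department) is repeated several times and decoded by majority vote, which makes it more resilient to changes than the first indication information (a member identifier within the group), which is embedded only once. The bit widths, the repetition factor and the helper names are hypothetical. When the text is damaged, the group can then often still be identified even if the individual member cannot.

```python
def int_to_bits(value, width):
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def encode_hierarchical(group_id, member_id, group_bits=4, member_bits=6, reps=5):
    """Group identifier: each bit repeated `reps` times. Member identifier: each bit once."""
    group = int_to_bits(group_id, group_bits)
    return [b for b in group for _ in range(reps)] + int_to_bits(member_id, member_bits)


def decode_hierarchical(bits, group_bits=4, member_bits=6, reps=5):
    """Majority-decode the group bits; read the member bits as they are."""
    group_id = 0
    for i in range(group_bits):
        chunk = bits[i * reps:(i + 1) * reps]
        group_id = (group_id << 1) | (1 if 2 * sum(chunk) > reps else 0)
    member_id = 0
    for b in bits[group_bits * reps:group_bits * reps + member_bits]:
        member_id = (member_id << 1) | b
    return group_id, member_id
```

For example, flipping two of the five copies of every group bit leaves group 9 intact, while a single flipped member bit already turns member 37 into member 5.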

[0182] In a preferred embodiment of the present invention, the system further comprises a computerized transformer unit operable to receive a version of a digital text that contains both editing changes and embedded information, the computerized transformer unit being further operable to produce a version of the digital text that contains both the editing changes and different embedded information.

BRIEF DESCRIPTION OF THE DRAWINGS

[0183] The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

[0184] In the drawings:

[0185] FIG. 1 is a flow-chart showing the sequence of steps for the insertion of forensic information in a digital textual document, constructed and operative in accordance with a preferred embodiment of the present invention;

[0186] FIG. 2 is a flow-chart showing the sequence of steps for creation of personalized text documents, constructed and operative in accordance with a preferred embodiment of the present invention;

[0187] FIG. 3 is an illustration of a simplified pre-versioning system, constructed and operative in accordance with a preferred embodiment of the present invention;

[0188] FIG. 4 is a flow-chart showing the sequence of steps for embedding hidden messages into a digital textual document, constructed and operative in accordance with a preferred embodiment of the present invention;

[0189] FIG. 5 is a flow-chart showing the sequence of steps for marking and pre-encryption of a set of data segments, constructed and operative in accordance with a preferred embodiment of the present invention;

[0190] FIG. 6 is a simplified block diagram describing group work on personalized documents, as part of a preferred embodiment of the present invention;

[0191] FIG. 7 is a simplified block diagram that represents the function of the version generator, in accordance with a preferred embodiment of the present invention;

[0192] FIG. 8 is a simplified diagram showing a hidden information reading unit, constructed and operative according to a preferred embodiment of the present invention; and

[0193] FIG. 9 is a simplified diagram illustrating a digital text usage control system, constructed and operative according to a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0194] The present invention seeks to provide a system and a method for on-line, real-time personalized marking of digital content, with an emphasis on text, in order to allow tracking and detection of the sources of leaks and breaches of confidential and proprietary information, thereby mitigating the hazards of digital espionage and unauthorized dissemination of proprietary information. The system and the methods can also be used as part of a digital rights management system.

[0195] According to a first aspect of the present invention, a method is described that is based on distributing a preferably unique copy to each of the recipients, thereby allowing the sources of breaches to be traced and detected. In a preferred embodiment of the method, a technique for maintaining the coherency and integrity of the personalized documents while working in groups is also described.

[0196] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. In addition, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

[0197] Reference is first made to FIG. 1, which is a simplified flowchart of the basic steps in practicing a preferred embodiment of the present invention. The original document or text is presented to the system (stage A, as indicated by 110) and undergoes an automatic versioning phase, in which several personalized versions of the original document or text are created by modifying elements of the text or the document (stage B, as indicated by 120). For each of the versions, a version descriptor is created (stage C, as indicated by 130). The version descriptor and the corresponding recipient are then inserted into a database (stage D, as indicated by 140), and the personalized versions are then distributed to the various recipients (stage E, as indicated by 150).
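Stages B to E of FIG. 1 can be sketched in Python with the standard `sqlite3` module. For brevity, the sketch assumes that the positions and their alternatives have already been identified, so the original document of stage A is represented as a format template with one slot per position; the version descriptor is simply the list of alternative indices chosen for each slot. The table layout and function names are hypothetical.

```python
import itertools
import math
import sqlite3


def create_versions(template, slots, recipients, db):
    """Create one personalized version per recipient and record its descriptor (FIG. 1)."""
    if math.prod(len(alts) for alts in slots) < len(recipients):
        raise ValueError("not enough distinct versions for all recipients")
    db.execute("CREATE TABLE IF NOT EXISTS versions (descriptor TEXT PRIMARY KEY, recipient TEXT)")
    combos = itertools.product(*(range(len(alts)) for alts in slots))
    outbox = {}
    for recipient, combo in zip(recipients, combos):
        text = template.format(*(alts[i] for alts, i in zip(slots, combo)))  # stage B
        descriptor = "-".join(map(str, combo))                                # stage C
        db.execute("INSERT INTO versions VALUES (?, ?)", (descriptor, recipient))  # stage D
        outbox[recipient] = text                                              # stage E
    db.commit()
    return outbox


def trace(descriptor, db):
    """Look up the recipient of a leaked version from its descriptor."""
    row = db.execute("SELECT recipient FROM versions WHERE descriptor = ?", (descriptor,)).fetchone()
    return row[0] if row else None
```

With the template `"The {0} car is {1}."` and the slots `[("grey", "gray"), ("parked", "stationary")]`, four distinct versions are available, so up to four recipients can be served.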

[0198] Some examples of modifying techniques operable for versioning are:

[0199] Punctuation: additional or missing commas, replacing commas “,” with semicolons “;” and vice versa, concatenation of sentences, usage of “, which” versus “that”, usage of parentheses instead of commas and vice versa, etc.

[0200] Spelling: if there is more than one way to spell a word (e.g., color/colour, can not/cannot, foreign words, names, etc.), then such a word is a candidate for modifying.

[0201] Exact synonyms, i.e., words that can be replaced with other words without causing an appreciable change (e.g., “for example” instead of “e.g.”).

[0202] Altering the number or size of spaces between words, lines and characters.

[0203] Altering some properties of some of the fonts.

[0204] Deliberate typos, especially in homophonic words.

[0205] Rephrasing of sentences and sub-sentences.

[0206] Rephrasing of paragraphs.

[0207] Capitalization (e.g., after “:”).

[0208] Additional words.

[0209] Replacing a character with a substantially similar looking character;

[0210] Replacing a character with a similar-looking character, wherein said characters differ only in their digital representation;

[0211] Replacing a character with a similar-looking character, wherein said characters differ only in their Unicode representation;

[0212] Removing an unprintable character;

[0213] Adding an unprintable character;

[0214] Replacing an unprintable character;

[0215] Exchanging between possible representations at an end of a paragraph;

[0216] Exchanging between possible representations at an end of a line;

[0217] Modifying the number of spaces between paragraphs;

[0218] Modifying the number of spaces at a line ending;

[0219] Modifying the number of tabs at a line ending;

[0220] Adding a space character at a line ending;

[0221] Adding a tab character at a line ending;

[0222] Modifying the size of spaces between paragraphs;

[0223] Modifying the size of spaces between lines;

[0224] Modifying the number of spaces representing a tab character;

[0225] Modifying the place of a tab;

[0226] Replacing a tab character with at least one space;

[0227] Replacing a space with a tab character;

[0228] Modifying the size of a tab character;

[0229] Modifying the font of a character;

[0230] Modifying the color of a character;

[0231] Modifying the size of a character;

[0232] Modifying a property of a character;

[0233] Modifying the background of the digital text;

[0234] Modifying the background of a character;

[0235] Replacing a character with an image similar to a character;

[0236] Modifying the digital representation of the digital content;

[0237] Modifying the internal logical division in the digital representation of the digital content;

[0238] Modifying the classification of a unit in the internal logical division in the digital representation of the digital content;

[0239] Modifying a property of a unit in the internal logical division in the digital representation of the digital content;

[0240] Modifying the classification of a paragraph;

[0241] Modifying a property of a paragraph;

[0242] Exchanging between some of the following:

[0243] versions of a word built from at least two words:

[0244] a concatenated version,

[0245] a version that uses a hyphen for separation, and

[0246] a version separated by a space;

[0247] Spelling modifications that exchange between an acronym and a full verbatim version of said acronym;

[0248] Spelling modifications that exchange between at least one shortened version of a word and the full version of said word;

[0249] Modifications that exchange between a correct version of a word and at least one other word, the other words having similar pronunciation to the correct word;

[0250] Exchange between synonyms;

[0251] Modifications that affect the order of elements within said digital text;

[0252] Modifications that affect the order of words;

[0253] Modifications that affect the order of sentences;

[0254] Modifications that affect the order of paragraphs;

[0255] Modifications that affect capitalization;

[0256] Removing a word;

[0257] Adding a word;

[0258] Replacing a word;

[0259] Modifications to diagrams embedded in the digital text;

[0260] Addition of diagrams embedded in the digital text;

[0261] Removal of diagrams embedded in the digital text;

[0262] Modifications to the shadow of a character;

[0263] Exchanging between different grammatical structures;

[0264] Modifying the phrasing of a part of the digital text such that the changed version retains its similarity to the original version.

[0265] Locating potential candidates for modification can be performed either manually or by using specialized software.

[0266] In another aspect of the present invention, another level of marking can be added by using watermarks on the background of the text, and in particular on the portion of the background behind words.

[0267] In general, not all the modifications available for versioning have the same merit: for example, deliberate typos reduce the quality of the document and are susceptible to spelling correction, and altering some font properties or the size of spaces between characters may not be robust against format changes. One can therefore assign a strength, or robustness, parameter to each modification, as well as a quality factor that defines to what extent the modification reduces the quality of the content.
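As a rough illustration of attaching a robustness parameter and a quality factor to each kind of modification, the sketch below filters a catalog of modification types against two thresholds. The class name, the 0 to 1 scales and every numeric score are hypothetical placeholders chosen for illustration; the text does not assign values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModificationType:
    name: str
    robustness: float    # assumed 0..1 scale: higher = more likely to survive reformatting or editing
    quality_cost: float  # assumed 0..1 scale: higher = more damage to the quality of the text


def usable(types, min_robustness, max_quality_cost):
    """Keep modification types meeting both thresholds, most robust first."""
    chosen = [t for t in types
              if t.robustness >= min_robustness and t.quality_cost <= max_quality_cost]
    return sorted(chosen, key=lambda t: t.robustness, reverse=True)


# Illustrative placeholder scores only; they are not taken from the description.
CATALOG = [
    ModificationType("deliberate typo", robustness=0.2, quality_cost=0.6),
    ModificationType("inter-character spacing", robustness=0.1, quality_cost=0.05),
    ModificationType("synonym exchange", robustness=0.8, quality_cost=0.2),
    ModificationType("punctuation variant", robustness=0.5, quality_cost=0.1),
]

print([t.name for t in usable(CATALOG, min_robustness=0.4, max_quality_cost=0.3)])
```

With these placeholder scores, the typo is rejected for low robustness and poor quality, and the spacing change for low robustness, leaving the synonym and punctuation modifications.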

[0268] FIG. 2 illustrates a flowchart of the process of preparing versions of various segments, according to a preferred embodiment of the present invention. At the first step, candidates for modification are located (stage A, as indicated by 210). After that, two or more modifications of each of the segments are produced, e.g., using one or more of the methods described above or the more extensive list of versioning techniques described elsewhere in this disclosure (stage B, as indicated by 220). The modifications preferably undergo a stage of approval, either manually (e.g., by the author of the text) and/or automatically (e.g., by another software component); this is stage C, indicated by reference numeral 230 in FIG. 2. Each of the approved modifications is then identified by a modification identifier (stage D, as indicated by 240) and is stored in a library on a storage device (stage E, as indicated by 250).
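The FIG. 2 stages might be sketched as follows, assuming a hypothetical word-level variant table as the only source of modifications and an automatic approval step that simply accepts every non-empty alternative (the text also allows manual approval by the author). The names `VARIANTS`, `build_library` and the "position.variant" identifier format are illustrative assumptions, not part of the described system.

```python
import re

# Hypothetical variant table: each entry lists interchangeable forms of a word.
VARIANTS = {"that": ["that", "which"], "e-mail": ["e-mail", "email"], "begin": ["begin", "start"]}


def locate_candidates(text):
    """Stage A: find (offset, word) pairs whose word has known variants."""
    return [(m.start(), m.group(0)) for m in re.finditer(r"[\w-]+", text)
            if m.group(0) in VARIANTS]


def produce_modifications(word):
    """Stage B: list the alternative forms available for a candidate."""
    return VARIANTS[word]


def approve(alternatives):
    """Stage C: placeholder automatic approval that accepts every non-empty alternative."""
    return [a for a in alternatives if a]


def build_library(text):
    """Stages D and E: give each approved modification an identifier and store it."""
    library = {}
    for pos_index, (offset, word) in enumerate(locate_candidates(text)):
        approved = approve(produce_modifications(word))
        if len(approved) >= 2:  # a position is only useful if it offers a real choice
            library[pos_index] = {
                "offset": offset,
                "variants": {f"{pos_index}.{j}": v for j, v in enumerate(approved)},
            }
    return library


print(build_library("Send the e-mail that we begin with."))
```

Here the in-memory dictionary stands in for the library on the storage device; a real system would persist it.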

[0269] Reference is now made to FIG. 3, which illustrates a process in which a set of modifications of a certain position is constructed and stored according to a preferred embodiment of the present invention. The position denoted by B, indicated by 304, is used by the modifying subsystem 308 in order to produce the modifications together with the corresponding identifier and descriptor: modification B1, indicated by 310, modification B2, indicated by 312, and modification B3, indicated by 314. The modifications, together with the corresponding identifiers and descriptors, are then stored in the storage device 316 for future usage.

[0270] The modifying process can also be done by grouping together several optional modifications into one set of logical symbols. The cardinality of this set is the product of the number of modifications in each optional position. E.g., if, within the group, there are four possible modifications for punctuation, three possible synonyms for a given word and two possible spellings, then there are a total of 4*3*2=24 possible modifications in the group. If we assign a logical symbol to each version, then the cardinality of the set of symbols is 24.
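The cardinality in the example above is simply the size of a Cartesian product. A minimal check in Python, with placeholder strings standing in for the actual punctuation, synonym and spelling options:

```python
from itertools import product
from math import prod

# The example group from the text: 4 punctuation options, 3 synonyms, 2 spellings.
# The option strings themselves are hypothetical placeholders.
group = [
    ["p0", "p1", "p2", "p3"],  # punctuation variants
    ["s0", "s1", "s2"],        # synonyms for a given word
    ["sp0", "sp1"],            # spellings
]

symbols = list(product(*group))  # each combination is one logical symbol
assert len(symbols) == prod(len(options) for options in group) == 24
print(len(symbols))  # 24
```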

[0271] Grouping of optional modifications may also be based on their order within the text. In this case, the content can be divided into segments, and the possible modifications within each segment may be grouped together to form a set of logical symbols. Each symbol in the set for a given segment is distinct from every other symbol in that set. Sets of pre-versioned data segments associated with different segments of the salient fraction may, but are not required to, contain segments with the same symbols. That is, each set contains an “alphabet” of logical symbols that may or may not be the same alphabet as the symbols contained within other sets associated with other segments. For example, a set associated with a first data segment may contain logical symbols “A”, “B” and “C”, while a set associated with a second segment may contain symbols “C”, “1” and “3”. All the sets of pre-versioned data segments are referred to as a library.

[0272] In general, it is advantageous to be able to identify a versioned copy based on a small portion of the text. In order to achieve that goal, the modifications between copies should be distributed along the text as uniformly as possible.
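One simple heuristic for spreading the modifications along the text, offered here only as an assumption-laden sketch, is to divide the text into equal slices and greedily pick the candidate position closest to the centre of each slice. The function name and the greedy strategy are hypothetical; a real system might also weigh robustness and quality.

```python
def spread_positions(candidate_offsets, text_length, count):
    """Pick up to `count` candidate offsets, each the one closest to an evenly spaced target."""
    remaining = sorted(set(candidate_offsets))
    chosen = []
    for i in range(count):
        if not remaining:
            break  # fewer candidates than requested positions
        target = (i + 0.5) * text_length / count  # centre of the i-th equal slice
        best = min(remaining, key=lambda off: abs(off - target))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


print(spread_positions([1, 2, 3, 50, 51, 98, 99], text_length=100, count=3))  # [3, 50, 98]
```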

[0273] As content is prepared for distribution to an authorized user according to the present embodiments, a unique copy of the content, which is preferably correlated with some aspects of the details of the authorized user, is produced. The unique content is preferably produced by selecting a specific sequence of modifications of the various positions. Denoting the j-th modification of the i-th position by V(i,j), a personalized version is created by selecting the sequence V(1,k₁), V(2,k₂), V(3,k₃), V(4,k₄), . . . , where the sequence k₁, k₂, . . . , which determines which modification in each position is selected, provides a unique characterization of the personalized copy. The desired document may then be produced by inserting the corresponding version of each segment in the appropriate position.
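Choosing the sequence k₁, k₂, . . . from a recipient identifier can be sketched as a mixed-radix conversion, in which position i contributes one digit whose base equals its number of available modifications. This is one possible mapping, assumed for illustration; indices here are zero-based, and the helper names are hypothetical.

```python
def id_to_choices(identifier, sizes):
    """Write `identifier` in mixed radix; sizes[i] is the number of modifications at position i."""
    total = 1
    for s in sizes:
        total *= s
    if not 0 <= identifier < total:
        raise ValueError("identifier does not fit in the available positions")
    choices = []
    for s in sizes:
        identifier, k = divmod(identifier, s)
        choices.append(k)  # k is the selected modification index at this position
    return choices


def choices_to_id(choices, sizes):
    """Inverse of id_to_choices."""
    identifier, weight = 0, 1
    for k, s in zip(choices, sizes):
        identifier += k * weight
        weight *= s
    return identifier


def render(segments, library, choices):
    """Assemble a copy: segments[i] is fixed text before position i; library[i][k] plays the role of V(i, k)."""
    parts = []
    for i, k in enumerate(choices):
        parts.append(segments[i])
        parts.append(library[i][k])
    parts.append(segments[len(choices)])
    return "".join(parts)


print(id_to_choices(7, [2, 3, 2]))  # [1, 0, 1]
print(render(["", " the ", "."], [["Start", "Begin"], ["plan", "scheme"]], [1, 1]))  # Begin the scheme.
```

Note that consecutive identifiers produced this way may differ at only one position, so such a direct mapping gives little separation between copies; the optimized scheme described for FIG. 4 aims for larger distances.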

[0274] The method may also be used to robustly embed other (not necessarily unique) information.

[0275] Turning now to FIG. 4, there is shown a block diagram of the steps for preparing a text for an on-line versioning system that allows a series of uniquely identifiable individual versions of a text to be produced, distributed and then uniquely identified. At the first stage (stage A, as indicated by 410), the number of required copies, N, is defined. At the next stage (stage B, as indicated by 420), an optimized scheme for the creation of N sequences of modifications is evaluated. In general, an optimal scheme would be such that the N copies are as remote as possible from one another, i.e., it would be as hard as possible to make one personalized version indistinguishable from another, in the sense that the number of differing modifications, weighted by the robustness factor, is maximal, while the quality of the versions is kept as high as possible. Such a notion of an optimal scheme is known from the domain of error-correcting codes. The optimization process may be based on exhaustive search or on a more structured search process in the combinatorial space.
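The distance notion described above, the number of differing modifications weighted by robustness, might be computed as follows. The version encoding (one modification index per position) and the per-position weights are assumptions made for this sketch.

```python
from itertools import combinations


def weighted_distance(a, b, robustness):
    """Sum of robustness weights over the positions where two versions choose different modifications."""
    return sum(w for x, y, w in zip(a, b, robustness) if x != y)


def min_pairwise_distance(versions, robustness):
    """The quantity an optimized scheme would try to keep large across all N copies."""
    return min(weighted_distance(a, b, robustness) for a, b in combinations(versions, 2))


versions = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
weights = [1.0, 0.5, 2.0]  # hypothetical robustness factor per position
print(min_pairwise_distance(versions, weights))  # 1.5
```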

[0276] After defining the optimal scheme, N different copies, with N different sequences of modifications, are produced (stage C, indicated as 430). An indicator, which may be correlated with some details of the recipient, is attached to each personalized version (stage D, indicated as 440). The copies are then distributed to the various recipients (stage E, indicated as 450), and the list of recipients, together with the corresponding descriptors, is stored in a database for further usage (stage F, indicated as 460). Such further usage may, for example, include identifying the source of a version that was distributed without authorization, and the like.

[0277] FIG. 5 schematically illustrates a document system for managing the creation and distribution of individualized versions of documents, which is referred to hereinafter as system 500. According to the configuration illustrated in FIG. 5, system 500 includes a version generator 510, which is preferably monitored by the document system interface 520. The original text created by the original text creator 530 is sent to the version generator 510, which produces versioned copies 540, such that each recipient may obtain a different version of the document. The version generator also sends the descriptors of the various versions to the database 560. The version handler 550 obtains information that characterizes the differences between the various versions and the original text. The database 560 obtains the version descriptors and the correlations between versions and recipients, in order to allow tracking and detection of breached documents.

[0278] The version handler 550 handles cases in which versioned text documents are transferred between recipients and/or to the original creator. The version handler compares the versions of the sender and the recipient, and modifies the sender's version accordingly, thereby allowing seamless group work on the document. In another preferred embodiment of the present invention, the information is embedded in a cryptographic format (encrypted and/or signed), thereby preventing certain harmful scenarios, such as the framing of an innocent user. This encryption and/or signing should be applied to the data before any error correction encoding, since otherwise the error correction code may be rendered ineffective.
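To illustrate the ordering requirement, the sketch below authenticates the payload first and only then adds redundancy, so that error correction operates on the already-authenticated bits. It substitutes an HMAC (a symmetric message authentication code from Python's standard library) for a signature, and a simple 3-fold repetition code for a real error-correcting code; the key, tag length and repetition factor are hypothetical. A deployment might prefer a public-key signature so that third parties can verify the tag.

```python
import hashlib
import hmac

KEY = b"hypothetical-secret-key"  # assumption: a key held only by the document system
TAG_BYTES = 4                     # truncated tag length, chosen for illustration


def to_bits(data):
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    return bytes(sum(bit << (7 - i) for i, bit in enumerate(bits[n:n + 8]))
                 for n in range(0, len(bits), 8))


def encode(message, repeat=3):
    """Authenticate first, then add redundancy (repetition code) to message + tag."""
    tag = hmac.new(KEY, message, hashlib.sha256).digest()[:TAG_BYTES]
    return [bit for bit in to_bits(message + tag) for _ in range(repeat)]


def decode(bits, repeat=3):
    """Undo the repetition code by majority vote, then verify the tag; None if verification fails."""
    voted = [int(sum(bits[i:i + repeat]) > repeat // 2) for i in range(0, len(bits), repeat)]
    payload = from_bits(voted)
    message, tag = payload[:-TAG_BYTES], payload[-TAG_BYTES:]
    expected = hmac.new(KEY, message, hashlib.sha256).digest()[:TAG_BYTES]
    return message if hmac.compare_digest(tag, expected) else None
```

A single flipped bit is corrected before verification and the message is accepted; flipping all three copies of a bit survives the majority vote, changes the message, and is then rejected by the tag check.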

[0279] Note that when a database is used, embedding may be done in advance, and the database entry may be updated after a pre-embedded copy is allocated to a certain recipient.

[0280] Reference is now made to FIG. 6, which is a simplified scheme of a preferred embodiment of the version handler 550, which allows group work on versioned documents using document-handling system 500. The sender 610, who wishes to send his working version 620 to a recipient 630 with working version 640, sends his working copy to the comparator 670 and the transformer 680. The comparator 670 compares the versioned text 620 with the reference version of the text 690 in order to locate the modifications that characterized the sender's version and that still remain after any edits the sender may have introduced while working on his version of the document. The transformer 680 preferably uses data from the database 660 and the comparator 670 in order to transform the personalization scheme of the sender into the personalization scheme of the recipient, in a transparent or seamless manner. This is implemented by first removing the specific personalized modifications that were introduced by the version generator and that may still remain in the sender's working version, and then producing the modifications characterizing the recipient's copy that would still have remained in the sender's working version had they been there from the beginning.

[0281] Note that if the original personalization scheme was rendered ineffective due to substantial changes that a writer introduced into his/her copy of the original text, then the changed text itself may contain a sufficient level of differences to enable the identification of the copy.

[0282] An alternative approach may take advantage of the fact that changes to the text are usually localized. This can be done either by using a specialized error correction code designed for correcting localized errors, or by embedding a simple error detection code on localized chunks of data (e.g., paragraphs) and verifying them before extraction of the embedded information (preferring the errorless chunks for extraction). A prior (and in many cases alternative) step may be to look for similarities between chunks in order to determine the origin of each chunk, which eases the verification of the chunks.
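The localized-chunk approach could be sketched as follows, under the simplifying assumption that every paragraph carries the same payload together with its own CRC-32 check value (`zlib.crc32` from the standard library). Chunks whose check fails are discarded before a majority vote among the remaining ones; the function names are hypothetical.

```python
import zlib
from collections import Counter


def protect(payload, n_chunks):
    """Assumed layout: the same payload in n_chunks paragraphs, each with its own CRC-32."""
    return [(payload, zlib.crc32(payload)) for _ in range(n_chunks)]


def recover(extracted_chunks):
    """Discard chunks whose payload no longer matches its CRC, then take the most common payload."""
    valid = [p for p, crc in extracted_chunks if zlib.crc32(p) == crc]
    return Counter(valid).most_common(1)[0][0] if valid else None


chunks = protect(b"recipient=42", 4)
chunks[1] = (b"recipient=43", chunks[1][1])  # simulate local edits damaging two paragraphs
chunks[2] = (b"recipient=47", chunks[2][1])
print(recover(chunks))  # b'recipient=42'
```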

[0283] In order to reduce the ability of recipients to tamper maliciously, it may be beneficial to embed personalized information for each subgroup of recipients, or for some of those subgroups, instead of embedding personalized information in each copy for each recipient; the embedding of information for said subgroups should be independent. Thus, if a subgroup of recipients attempts to remove the specific information for its members by comparing their respective copies and removing the information identified as differences, they can still be identified by the subgroup's information, which will be identical in all their copies. In certain cases, when personalized information is embedded for each (proper or otherwise) subgroup of recipients, or for some of those subgroups (again independently), per-recipient personalized information may become redundant, because an individual recipient may be uniquely identified by the intersection of the subgroups of which she (or he) is a member.
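A toy construction, assumed here for illustration only, defines two subgroups per bit of a recipient index: recipients whose bit i is 0 and those whose bit i is 1. Marks shared by colluding recipients are identical in their copies, so comparing copies does not reveal them, and the surviving marks narrow the suspects down to the colluders. Practical systems would more likely use dedicated collusion-secure fingerprinting codes; the names and sizes below are hypothetical.

```python
BITS = 3  # hypothetical sizing: 8 recipients, 2 * BITS subgroups


def memberships(recipient):
    """Subgroup (i, b) contains every recipient whose index has bit i equal to b."""
    return {(i, (recipient >> i) & 1) for i in range(BITS)}


def suspects(surviving_marks, recipients=range(2 ** BITS)):
    """Recipients belonging to every subgroup whose mark survived: the intersection of those subgroups."""
    return [r for r in recipients if surviving_marks <= memberships(r)]


# Recipients 5 (binary 101) and 7 (binary 111) compare copies: the bit-1 marks differ and can be
# stripped, but the marks they share are identical in both copies and survive.
colluders = [5, 7]
shared = set.intersection(*(memberships(r) for r in colluders))
print(suspects(shared))          # [5, 7]
print(suspects(memberships(5)))  # [5]: all subgroups together identify a single recipient
```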

[0284] Note that some attacks on the content may consist of canonizing the text in some manner; it is therefore of great benefit to embed the watermark independently using a number of methods, or with an error correction code that is designed to handle the complete removal of all information encoded using some of the methods, thereby creating enough redundancy to mitigate most canonizing attacks.

[0285] Turning now to FIG. 7, there is illustrated a block diagram that represents the function of the version generator, in accordance with a preferred embodiment of the present invention. The version generator 510 of the document-handling system 500 gets as inputs the original text, the required number of versions, the minimal distance between versions and the allowed depth of versioning, where “deeper versioning” refers to more substantial modifications in the text. The policy manager 720 provides rules regarding which modifications require approval from the creator or an authorized party (e.g., operator, administrator). If an approval is required, the user interface 730 prompts the user with a suggestion for modifications and asks for approval. The data storage 740 contains all the approved modifications that can be used for versioning. The total possible number of personalized copies is the product of the number of modifications of each optional position. E.g., if, within a paragraph, there are four possible modifications for punctuation, three possible synonyms for a given word and two possible spellings of another given word, then there are a total of 4*3*2=24 possible versions. In order to provide a sufficient level of redundancy, which is needed for error correction and robustness, the total number of possible versions should be significantly larger than the required number of versions, such that between the versions of any two different users the minimal number of differing modifications would exceed a certain threshold value Θ, which may be provided by the user or an authorized party (e.g., operator, administrator). If the total number of possible versions is significantly larger than the required number of versions, then it is probably sufficient to create the various versions by randomly selecting among the possible modifications using the random selector 750, and checking afterwards that the minimal distance is indeed larger than Θ using the testing module 760. Otherwise, one can use one of the numerous error-correction codes available. The modifications that characterize each version are stored in the database 770.
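The random selector 750 and testing module 760 might be sketched as follows, with versions represented as tuples of modification indices and the distance taken as the number of differing positions. The retry limit, seed handling and function names are assumptions; returning `None` stands for the case in which a structured error-correcting code should be used instead.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two versions choose different modifications."""
    return sum(x != y for x, y in zip(a, b))


def random_versions(sizes, count, theta, max_attempts=100, seed=None):
    """Random selector plus testing module: accept only if every pair differs in more than theta positions."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        versions = [tuple(rng.randrange(s) for s in sizes) for _ in range(count)]
        if all(hamming(a, b) > theta for a, b in combinations(versions, 2)):
            return versions
    return None  # fall back to a structured error-correcting code


# 40 positions with 4 modifications each, 10 recipients, threshold 15.
versions = random_versions([4] * 40, count=10, theta=15, seed=1)
print(versions is not None)
```

With 40 positions of 4 options, two random versions differ in about 30 positions on average, so a threshold of 15 is met easily; with few positions and many recipients the test fails and the structured approach is needed.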

[0286] It is important to note that the aforementioned level (depth) of versioning is not a linear scale, but rather a set of allowed methods and restrictions on using those methods (e.g., no more than 2 typos in a paragraph).

[0287] Note that the impact of modifications may be application- or context-dependent; e.g., modifications in punctuation in the source code of a computer program may affect the result of its compilation and may cause it to cease functioning altogether, e.g., by causing a syntax error.

[0288] It is also important to note that in some applications there may not be as many degrees of freedom as needed to satisfy the set constraints, which may result either in changing or relaxing the constraints (automatically, manually or a combination of both), or in a failure to embed all the necessary data (either embedding partial information, or none at all). An implementation may need to address this issue according to the specific application in question (e.g., failing the whole versioning process and then denying access to the text, or alerting an operator that changes to the configuration need to be made).

[0289] Also, it is noted that in general, specific handling of versions of specialized types of text (e.g., poems and sonnets, code in specific programming languages, spreadsheet data, a combination of several domains, etc.) may need both classification of the type of the text and specialized parsing in order to identify changeable positions. Classification of the type of the text may also be needed in order to employ the correct policy for handling the content.

[0290] Turning now to FIG. 8, there is illustrated a hidden information reading unit 800, constructed and operative according to a preferred embodiment of the present invention. The document reader 810 reads the analyzed document, and the document identifier 820 attempts to identify the document (e.g., using file metadata or based on the textual content of the document), preferably using the data in the database 830. If the document is found to be one in which hidden information is embedded, then the modifications detector 840 goes over all the positions for which two or more modifications were prepared and attempts to detect which version was embedded. The results are then sent to the maximum likelihood estimator 850, which estimates the likelihood of the most probable sequences of modifications that comprise the hidden information. This is especially important in cases where the document has undergone substantial changes due to editing and/or malicious attacks. The decision unit 860 uses the likelihood information in order to decide which hidden information is embedded in the analyzed document, and possibly also to determine the personalized version that is most likely to be the source of the analyzed document. The output from the reader is provided in the form of embedded information.
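The maximum likelihood estimator 850 could, under a simple independent-error model assumed here, score each stored personalized version by the log-probability of the observed detections and pick the best one. Lost positions (`None`) are skipped; the error probability `p_error`, the dictionary layout and the function names are hypothetical.

```python
import math


def log_likelihood(observed, version, p_error):
    """Log-probability of the detections if `version` was the source.

    observed[i] is the detected modification index at position i, or None if the position
    was lost (e.g. deleted by editing). Each surviving detection is assumed to match the
    embedded modification with probability 1 - p_error, independently (a simplification)."""
    score = 0.0
    for seen, embedded in zip(observed, version):
        if seen is None:
            continue  # missing positions carry no evidence
        score += math.log(1 - p_error) if seen == embedded else math.log(p_error)
    return score


def most_likely_source(observed, versions, p_error=0.1):
    """Return (recipient, log-likelihood) for the best-matching personalized version."""
    return max(((r, log_likelihood(observed, v, p_error)) for r, v in versions.items()),
               key=lambda item: item[1])


versions = {"alice": (0, 1, 1, 0, 2), "bob": (1, 1, 0, 0, 1), "carol": (0, 0, 1, 1, 2)}
observed = (0, None, 1, 0, 2)  # position 2 was lost to editing
print(most_likely_source(observed, versions)[0])  # alice
```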

[0291] Turning now to FIG. 9, there is illustrated a digital text usage control system 900, constructed and operative according to a preferred embodiment of the present invention. The embedded information reading unit 800 reads the digital text 910. The usage control unit 920 obtains information from the information reading unit 800 and determines the permitted usage of the digital text 910. The permitted usage is typically one or more of the following: viewing the digital text, editing the digital text, transferring the digital text and storing the digital text. The usage control unit 920 then instructs the digital text usage unit 930 whether to allow a requested usage 940.

[0292] Other limitations may include the following: limitations about the time in which it is allowable to use the digital text; limitations about where it is allowable to use the digital text; limitations about how it is allowable to use the digital text; and limitations about who is allowed to use the digital text.
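A usage control check covering the four kinds of limitation listed above (when, where, how and who) might look like the following. The policy fields, and the idea that they are obtained from the embedded information or from a database entry it indexes, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsagePolicy:
    """Hypothetical limitations decoded from (or looked up via) the embedded information."""
    allowed_users: set      # who
    allowed_locations: set  # where
    allowed_actions: set    # how, e.g. {"view", "edit", "transfer", "store"}
    not_after: datetime     # when


def permitted(policy, user, location, action, when):
    """Allow the requested usage only if every limitation is satisfied."""
    return (user in policy.allowed_users
            and location in policy.allowed_locations
            and action in policy.allowed_actions
            and when <= policy.not_after)


policy = UsagePolicy({"alice"}, {"hq"}, {"view"}, datetime(2030, 1, 1))
print(permitted(policy, "alice", "hq", "view", datetime(2029, 5, 1)))      # True
print(permitted(policy, "alice", "hq", "transfer", datetime(2029, 5, 1)))  # False
```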

[0293] The usage limitations may be contingent on any one of a number of factors, including the following: the identity of the user; the usage rights granted to the user; the identity or nature of the digital text; the risks associated with the usage; the security mechanisms involved in using the text; and the type of usage that is being attempted. Thus, for example, very different usage regimes are likely depending on whether the main concern is copyright violation or the leaking of commercially sensitive information or of sensitive security information.

[0294] In another embodiment of the present invention, the information is embedded in the text in a manner that does not require actual use of the original document or of any other reference document in order to read the embedded information. In the watermark embedding literature, this method is referred to as oblivious reading. To illustrate the implementation of such a method, one may consider each place in which an occurrence of “that” may be replaced by “which”, or vice versa, as a place in which a bit is embedded, and consider an occurrence of “that” in this position as “1” and an occurrence of “which” as “0”. The message is encoded using an error-detection code and an error-correction code, so that only a very small fraction of the possible strings of zeros and ones are legitimate. While reading, the reader renders a string of ones and zeros. If the string is legitimate, then it is assumed that the detected message was indeed embedded in the text. Thus, the investigation of legitimacy is carried out without reference to another version. Note that oblivious methods are, by nature, less robust than non-oblivious methods. These methods enable avoiding, or at least reducing, the usage of databases and are especially useful when embedding is done in a distributed manner without the ability to contact a central database. An alternative approach is to use a distributed scheme in which multiple databases are used, and in which the embedded information also contains the index of the database.
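The that/which example can be sketched end to end. This version uses only an error-detection code (the low 8 bits of CRC-32 via `zlib.crc32`) and omits error correction, and the sentence template is purely illustrative; the reader never consults the original text or any other reference version.

```python
import re
import zlib

CHECK_BITS = 8  # hypothetical: low 8 bits of CRC-32 serve as the error-detection code


def with_check(bits):
    """Append check bits so that only a small fraction of bit strings are legitimate."""
    crc = zlib.crc32(bytes(bits)) & 0xFF
    return bits + [(crc >> i) & 1 for i in range(CHECK_BITS)]


def embed(template, bits):
    """Fill each {} slot with "that" for 1 and "which" for 0."""
    return template.format(*("that" if b else "which" for b in bits))


def read(text):
    """Oblivious reading: recover bits from that/which alone and test legitimacy."""
    words = re.findall(r"\b(that|which)\b", text, re.IGNORECASE)
    bits = [1 if w.lower() == "that" else 0 for w in words]
    payload, check = bits[:-CHECK_BITS], bits[-CHECK_BITS:]
    if len(bits) > CHECK_BITS and with_check(payload)[-CHECK_BITS:] == check:
        return payload
    return None  # not a legitimate string: assume no message was embedded


template = " ".join(["This is the rule {} applies."] * 12)  # 4 payload bits + 8 check bits
text = embed(template, with_check([1, 0, 1, 1]))
print(read(text))  # [1, 0, 1, 1]
```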

[0295] In another embodiment of the present invention, the embedded information is used as a reactive measure for copyright protection of digital books (“e-books”) and other copyrighted textual content. The embedded information can be used as a forensic measure in order to trace an authorized user who distributes textual content in an unauthorized manner, thereby providing an effective deterrent against unauthorized distribution.

[0296] It is appreciated that one or more steps of any of the methods described herein may be implemented in a different order than that shown, while not departing from the spirit and scope of the invention.

[0297] While the present invention may or may not have been described with reference to specific hardware or software, the present invention has been described in a manner sufficient to enable persons having ordinary skill in the art to readily adapt commercially available hardware and software as may be needed to reduce any of the embodiments of the present invention to practice without undue experimentation and using conventional techniques.

[0298] While the present invention has been described with reference to one or more specific embodiments, the description is intended to be illustrative of the invention as a whole and is not to be construed as limiting the invention to the embodiments shown. It is appreciated that various modifications may occur to those skilled in the art that, while not specifically shown herein, are nevertheless within the true spirit and scope of the invention.

We claim:
1. A method for automatically embedding information in a digital text, said method comprising: identifying a plurality of positions, in said digital text, that are suitable for introducing modifications into said digital text; identifying modifications suitable for introduction into at least some of said suitable positions in said digital text; selecting at least some of said identified modifications for introduction into said digital text, said selection of said modifications being operable to represent said information; and performing said selected modifications on said digital text, thereby to embed said information.
2. A method according to claim 1, wherein said method further comprises the approval of said selection of modifications in said digital text.
3. A method according to claim 1, wherein said modifications include at least one of the following: replacing a character with a substantially similar looking character; replacing a character with a similarly looking character, wherein said characters only differ in their digital representation; replacing a character with a similarly looking character, wherein said characters only differ in their Unicode representation; removing an unprintable character; adding an unprintable character; replacing an unprintable character; exchanging between at least two possible representations of an end of a paragraph; and exchanging between at least two possible representations of an end of a line.
4. A method according to claim 3, wherein said modifications include at least one of the following: modifying the number of spaces between words; modifying the number of spaces between paragraphs; modifying the number of spaces between lines; modifying the number of spaces at a line ending; modifying the number of tabs at a line ending; adding at least one space character at a line ending; adding at least one tab character at a line ending; modifying the size of spaces between words; modifying the size of spaces between paragraphs; modifying the size of spaces between lines; modifying the size of spaces between characters; modifying the number of spaces representing a tab character; modifying the place of a tab; replacing a tab character with at least one space; replacing at least one space with a tab character; and modifying the size of a tab character.
5. A method according to claim 1, wherein said modifications include at least one of the following: modifying the font of at least one character; modifying the color of at least one character; modifying the size of at least one character; modifying a property of at least one character; modifying the background of said digital text; modifying the background of at least one character; replacing a character with an image similar to said character; modifying the digital representation of said digital content; modifying the internal logical division in the digital representation of said digital content; modifying the classification of a unit in the internal logical division in the digital representation of said digital content; modifying a property of a unit in the internal logical division in the digital representation of said digital content; modifying the classification of a paragraph; and modifying a property of a paragraph.
6. A method according to claim 1, wherein said modifications include at least one of the following: punctuation modifications; spelling modifications; spelling modifications that exchange between different valid spellings of the same word; and spelling modifications that exchange between at least one valid spelling of a word and at least one invalid spelling of said word.
7. A method according to claim 1, wherein said modifications include at least one of the following: exchanging between some of the following versions of a word built from at least two words: a concatenated version, a version that uses a hyphen for separation and a version separated by a space; spelling modifications that exchange between an acronym and full verbatim versions of said acronym; spelling modifications that exchange between at least one shortened version of a word and the full version of said word; exchanging between a correct version of a word and at least one other word, said other words having similar pronunciation to said correct word; exchanges between synonyms; modifications that affect an order of elements within said digital text; modifications that affect an order of words; modifications that affect an order of sentences; and modifications that affect an order of paragraphs.
8. A method according to claim 1, wherein said modifications include at least one of the following: modifications that affect capitalization; removing at least one word; adding at least one word; replacing at least one word; modifications to diagrams embedded in said digital text; addition of diagrams embedded in said digital text; removal of diagrams embedded in said digital text; modifications to the shadow of at least one character; exchanging between at least two different grammatical structures; and modifying the phrasing of at least a part of said digital text such that the changed version remains similar to the original version.
9. A method according to claim 1, wherein said identification of modifications is performed in a manner which takes into consideration limitations imposed by the digital representation of said digital text.
10. A method according to claim 1, wherein said embedded information contains information suitable to identify at least one entry in a database, said database entry containing additional information.
11. A method according to claim 1, wherein said embedded information contains information operable to identify at least one recipient of said digital text.
12. A method according to claim 11, comprising the step of selecting different combinations of said modifications to form different copies of said digital text such that a plurality of recipients of said digital text each receive a personally modified version of said digital text, said different combinations within said embedded information being operable to uniquely identify a respective recipient of each copy.
13. A method according to claim 1, wherein said embedded information contains information operable to identify at least one editor of said digital text.
14. A method according to claim 1, comprising automatically performing said step of identifying positions in said digital text.
15. A method according to claim 1, wherein said step of identifying positions in said digital text is performed manually.
16. A method according to claim 1, wherein said step of identifying positions in said digital text is performed such that said positions are distributed in a predefined manner within said digital text.
17. A method according to claim 16, wherein said predefined manner of distribution of said positions within said digital text is a distribution wherein all portions of said digital text larger than a given size contain enough embedded information to reconstruct a predetermined subset of said embedded information.
18. A method according to claim 16, wherein said predefined manner of distribution of said positions within said digital text is a distribution defined such that removal of a significant number of said positions from said digital text results in significant degradation of the value of said digital text.
19. A method according to claim 1, wherein at least part of said embedded information is encoded using at least one of the following: error detection code; error correction code; cryptographic signature; and cryptographic encryption.
20. A method according to claim 1, wherein said identification of suitable modifications is performed in a manner which takes into account the limitations imposed by requirements concerning the quality of said digital text and the resemblance of said modified text to the original version of said digital text.
21. A method according to claim 1, wherein said selection of said identified modifications is performed so that at least two potential modifications are grouped together, and wherein several versions of said digital text are produced with different embedded information, said group of changes being performed in unison, such that if a modification which is part of said group is performed on one version of said text, all other modifications in said group are also performed on said version.
22. A method according to claim 21, wherein said modifications in said group are in proximity to each other within said digital text.
23. A method according to claim 1, wherein said selection of modifications is performed so as to take into account the amount of information which is to be embedded in said digital text.
 24. A method according to claim23, wherein said amount of information which is to be embedded in saiddigital text is dictated by at least one of the followingconsiderations: the amount of actual information which needs to berepresented by said information embedded in said digital text; the usageof error correction code; the usage of error detection code; therequirements on robustness; the required number of different versions ofsaid digital text; the need to embed a database index; and the need toembed versioning information.
25. A method according to claim 1, wherein said embedded information contains at least one of the following: versioning information; editing history information; forensics information; transfer history information; and information operable to identify and categorize said digital text.
26. A method according to claim 3, wherein said embedded information is substantially imperceptible.
27. A method for monitoring digital text by utilizing information embedded in digital texts, said method comprising: embedding information in digital texts it is desired to monitor; detecting an attempt to use a specific digital text; determining whether said specific digital text contains said embedded information; determining whether said specific digital text is one of said digital texts it is desired to monitor according to said embedded information; and reading said information embedded in said specific digital text.
28. A method according to claim 27, wherein said embedded information is operable to identify the source of said digital text when said digital text is found in at least one of the following states: in the possession of an unauthorized party; in an unauthorized location; in an unsecured location; and in an unsecured format.
29. A method according to claim 28, wherein said embedded information is further operable to identify at least part of the path by which said digital text reached said state.
30. A method according to claim 27, wherein said method further comprises controlling the usage of said digital text according to said embedded information.
31. A method according to claim 30, wherein said embedded information contains at least one limitation about the usage of said digital text.
32. A method according to claim 31, wherein said limitations comprise at least one of the following: limitations about the time in which it is allowable to use said digital text; limitations about where it is allowable to use said digital text; limitations about how it is allowable to use said digital text; and limitations about who is allowed to use said digital text.
33. A method according to claim 32, wherein said controlling is dependent on at least one of the following: the identity of the user performing said usage; the usage rights of the user performing said usage; the identity of said digital text; the risks associated with said usage; the security mechanisms used in said usage; and the type of usage.
34. A method according to claim 32, wherein said limitations on how said text is used comprise limitations on at least one of the following: viewing said digital text; editing said digital text; transferring said digital text; and storing said digital text.
35. A system for controlling usage of a digital text by utilizing information embedded in digital texts, said system comprising: at least one computerized information embedding unit operable to embed said information in said digital texts; at least one computerized information reading unit operable to read said information embedded in said digital texts; at least one computerized digital text usage unit operable to use said digital texts; and at least one computerized control unit operable to: receive notification from said computerized digital text usage unit, said notification indicating said digital text; receive information from said computerized information reading unit, said information dependent on said information embedded in said digital text and read by said computerized information reading unit; and instruct said computerized digital text usage unit on a usage policy for said digital text, said usage policy dependent on said information received from said computerized information reading unit.
36. A system according to claim 35, wherein said embedded information is operable to identify the source of said digital text when said digital text is found in the possession of an unauthorized party.
37. A system according to claim 35, wherein said system further comprises at least one database containing at least one entry containing additional information, and wherein said embedded information is operable to be correlated to said entry.
38. A system according to claim 35, wherein said system further comprises at least one computerized document management unit operable to maintain information about digital texts.
39. A system according to claim 38, wherein said computerized document management unit is operable to maintain at least one of the following types of information: versioning information; editing history information; usage policy information; transfer history information; and category information.
40. A system according to claim 38, wherein said computerized document management unit is operable to interact with said computerized control unit.
41. A system according to claim 40, wherein said interaction comprises at least one of the following: said computerized control unit informing said computerized document management unit about usage of said digital text; and said computerized document management unit sending information to said computerized control unit, said information sent operable to be used by said computerized control unit to create said usage policy.
42. A system according to claim 35, wherein said usage policy comprises at least one of the following: preventing said usage; restricting said usage; monitoring said usage; reporting said usage; and allowing said usage.
43. A system according to claim 35, wherein said usage policy depends on at least one of the following: the identity of the user performing said usage; the usage rights of the user performing said usage; the identity of said digital text; the identity of the editors of the version of said digital text used in said usage; the risks associated with said usage; the security mechanisms used in said usage; and the type of usage.
44. A system according to claim 35, wherein said usage comprises at least one of the following: viewing said digital text; editing said digital text; transferring said digital text; and storing said digital text.
45. A system according to claim 35, wherein said embedded information contains first indication information, said first indication information indicating at least one element in a group, and wherein said embedded information further contains second indication information, said second indication information indicating said group.
46. A system according to claim 35, wherein said embedded information contains a plurality of information elements, and wherein a subset of said information elements is embedded into said digital text such that said subset of said information elements is encoded in a manner more resilient to a change in said digital text than the embedding of another subset of said information elements.
47. A system according to claim 35, wherein said system further comprises a computerized transformer unit operable to receive a version of a digital text, said version containing both editing changes and embedded information, and wherein said computerized transformer unit is further operable to produce a version of said digital text which contains both said editing changes and different embedded information.
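Claims 21 and 22 describe grouping nearby modifications and performing each group in unison, so that different versions of a text differ by whole groups. The Python sketch below shows one possible reading of that idea. It rests on several assumptions that are not stated in the claims: the text is handled as a list of word tokens, every candidate modification is a single-token substitution (for example a synonym), each group carries one bit of the embedded information, and the reader recovers each bit by majority vote over the group. The names `Modification`, `group_modifications`, `embed` and `extract` are hypothetical. The identification of candidate positions and modifications (claim 1) is assumed to have been done beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    # Hypothetical representation: a single-token substitution, so token
    # indices stay valid after any subset of modifications is performed.
    index: int        # token index in the original text
    original: str     # token found at that index in the original
    replacement: str  # candidate substitute, e.g. a synonym


def group_modifications(mods: list[Modification], group_size: int) -> list[list[Modification]]:
    """Group candidate modifications that lie close together in the text.

    Modifications are ordered by position and cut into consecutive groups;
    a trailing incomplete group is discarded.
    """
    ordered = sorted(mods, key=lambda m: m.index)
    return [ordered[i:i + group_size]
            for i in range(0, len(ordered) - group_size + 1, group_size)]


def embed(tokens: list[str], groups: list[list[Modification]], bits: list[int]) -> list[str]:
    """Perform each group in unison: all of its modifications if the bit is 1, none if 0."""
    if len(bits) > len(groups):
        raise ValueError("payload needs more groups than are available")
    out = list(tokens)
    for group, bit in zip(groups, bits):
        if bit:
            for m in group:
                if out[m.index] != m.original:
                    raise ValueError(f"token at {m.index} does not match {m.original!r}")
                out[m.index] = m.replacement
    return out


def extract(tokens: list[str], groups: list[list[Modification]], n_bits: int) -> list[int]:
    """Read one bit per group by majority vote over the group's positions."""
    bits = []
    for group in groups[:n_bits]:
        hits = sum(tokens[m.index] == m.replacement for m in group)
        bits.append(1 if 2 * hits > len(group) else 0)
    return bits
```

Because the modifications of a group are performed or omitted together, reverting one substitution in a three-member group does not change the bit that `extract` reads. Larger groups trade embedding capacity for this kind of resilience, which is one way to read the trade-offs listed in claims 23 and 24.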